Cyberinfrastructure Exploration Server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Knowledge Extraction and Semantic Annotation of Text from the Encyclopedia of Life

Internal identifier: 000204 (Main/Exploration); previous: 000203; next: 000205

Authors: Anne E. Thessen [United States]; Cynthia Sims Parr [United States]

Source: PLoS ONE (2014)

RBID : PMC:3940440

Abstract

Numerous digitization and ontological initiatives have focused on translating biological knowledge from narrative text to machine-readable formats. In this paper, we describe two workflows for knowledge extraction and semantic annotation of text data objects featured in an online biodiversity aggregator, the Encyclopedia of Life. One workflow tags text with DBpedia URIs based on keywords. Another workflow finds taxon names in text using GNRD for the purpose of building a species association network. Both workflows work well: the annotation workflow has an F1 Score of 0.941 and the association algorithm has an F1 Score of 0.885. Existing text annotators such as Terminizer and DBpedia Spotlight performed well, but require some optimization to be useful in the ecology and evolution domain. Important future work includes scaling up and improving accuracy through the use of distributional semantics.
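
The abstract describes keyword-based tagging with DBpedia URIs and reports F1 scores for both workflows. As a minimal sketch of how such tagging and scoring could fit together, the Python below matches a small keyword table against a sentence and computes F1 against a hand-made gold set. The keyword table, the sample sentence, the gold set, and the function names `annotate` and `f1_score` are illustrative assumptions, not the authors' implementation or data.

```python
import re

# Hypothetical keyword-to-URI table (assumption); the keyword list used in
# the paper is not reproduced on this page.
KEYWORD_URIS = {
    "predator": "http://dbpedia.org/resource/Predation",
    "parasite": "http://dbpedia.org/resource/Parasitism",
    "pollinator": "http://dbpedia.org/resource/Pollination",
}


def annotate(text):
    """Return the URIs whose keyword (singular or plural) occurs as a whole word."""
    found = set()
    for keyword, uri in KEYWORD_URIS.items():
        pattern = r"\b" + re.escape(keyword) + r"s?\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(uri)
    return found


def f1_score(predicted, gold):
    """Harmonic mean of precision and recall over two sets of annotations."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Made-up sentence and gold annotations, for illustration only.
text = "The wasp is a parasite of caterpillars and a predator of aphids."
predicted = annotate(text)
gold = set(KEYWORD_URIS.values())  # pretend an annotator also tagged pollination
score = f1_score(predicted, gold)
print(round(score, 3))  # 0.8 (precision 1.0, recall 2/3)
```

Scoring on sets keeps the example short; an evaluation that counts repeated mentions or annotation spans would need a different matching rule.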


Url: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3940440
DOI: 10.1371/journal.pone.0089550
PubMed: 24594988
PubMed Central: 3940440


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Knowledge Extraction and Semantic Annotation of Text from the Encyclopedia of Life</title>
<author>
<name sortKey="Thessen, Anne E" sort="Thessen, Anne E" uniqKey="Thessen A" first="Anne E." last="Thessen">Anne E. Thessen</name>
<affiliation wicri:level="2">
<nlm:aff id="aff1">
<addr-line>Arizona State University, School of Life Sciences, Tempe, Arizona, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Arizona State University, School of Life Sciences, Tempe, Arizona</wicri:regionArea>
<placeName>
<region type="state">Arizona</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Parr, Cynthia Sims" sort="Parr, Cynthia Sims" uniqKey="Parr C" first="Cynthia Sims" last="Parr">Cynthia Sims Parr</name>
<affiliation wicri:level="1">
<nlm:aff id="aff2">
<addr-line>National Museum of Natural History, Smithsonian Institution, Washington, District of Columbia, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>National Museum of Natural History, Smithsonian Institution, Washington, District of Columbia</wicri:regionArea>
<wicri:noRegion>District of Columbia</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">24594988</idno>
<idno type="pmc">3940440</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3940440</idno>
<idno type="RBID">PMC:3940440</idno>
<idno type="doi">10.1371/journal.pone.0089550</idno>
<date when="2014">2014</date>
<idno type="wicri:Area/Pmc/Corpus">000629</idno>
<idno type="wicri:Area/Pmc/Curation">000629</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000192</idno>
<idno type="wicri:Area/Ncbi/Merge">000507</idno>
<idno type="wicri:Area/Ncbi/Curation">000507</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">000507</idno>
<idno type="wicri:Area/Main/Merge">000204</idno>
<idno type="wicri:Area/Main/Curation">000204</idno>
<idno type="wicri:Area/Main/Exploration">000204</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Knowledge Extraction and Semantic Annotation of Text from the Encyclopedia of Life</title>
<author>
<name sortKey="Thessen, Anne E" sort="Thessen, Anne E" uniqKey="Thessen A" first="Anne E." last="Thessen">Anne E. Thessen</name>
<affiliation wicri:level="2">
<nlm:aff id="aff1">
<addr-line>Arizona State University, School of Life Sciences, Tempe, Arizona, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Arizona State University, School of Life Sciences, Tempe, Arizona</wicri:regionArea>
<placeName>
<region type="state">Arizona</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Parr, Cynthia Sims" sort="Parr, Cynthia Sims" uniqKey="Parr C" first="Cynthia Sims" last="Parr">Cynthia Sims Parr</name>
<affiliation wicri:level="1">
<nlm:aff id="aff2">
<addr-line>National Museum of Natural History, Smithsonian Institution, Washington, District of Columbia, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>National Museum of Natural History, Smithsonian Institution, Washington, District of Columbia</wicri:regionArea>
<wicri:noRegion>District of Columbia</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j">PLoS ONE</title>
<idno type="eISSN">1932-6203</idno>
<imprint>
<date when="2014">2014</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Numerous digitization and ontological initiatives have focused on translating biological knowledge from narrative text to machine-readable formats. In this paper, we describe two workflows for knowledge extraction and semantic annotation of text data objects featured in an online biodiversity aggregator, the Encyclopedia of Life. One workflow tags text with DBpedia URIs based on keywords. Another workflow finds taxon names in text using GNRD for the purpose of building a species association network. Both workflows work well: the annotation workflow has an F1 Score of 0.941 and the association algorithm has an F1 Score of 0.885. Existing text annotators such as Terminizer and DBpedia Spotlight performed well, but require some optimization to be useful in the ecology and evolution domain. Important future work includes scaling up and improving accuracy through the use of distributional semantics.</p>
</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>États-Unis</li>
</country>
<region>
<li>Arizona</li>
</region>
</list>
<tree>
<country name="États-Unis">
<region name="Arizona">
<name sortKey="Thessen, Anne E" sort="Thessen, Anne E" uniqKey="Thessen A" first="Anne E." last="Thessen">Anne E. Thessen</name>
</region>
<name sortKey="Parr, Cynthia Sims" sort="Parr, Cynthia Sims" uniqKey="Parr C" first="Cynthia Sims" last="Parr">Cynthia Sims Parr</name>
</country>
</tree>
</affiliations>
</record>
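
As a rough sketch of how a record like the one above could be read programmatically, the Python below pulls the title, author names, and identifiers out of a trimmed sample using the standard library. The sample string, the function name `summarize`, and the two namespace URIs are assumptions made for illustration: the record as displayed does not declare the `wicri:` and `nlm:` prefixes, and a strict XML parser rejects unbound prefixes, so placeholder declarations are added on the root element.

```python
import xml.etree.ElementTree as ET

# Trimmed sample modeled on the record above. The namespace URIs are
# placeholders (assumptions), added only so the wicri: prefix is bound.
SAMPLE = """<record xmlns:wicri="urn:example:wicri" xmlns:nlm="urn:example:nlm">
<TEI><teiHeader><fileDesc>
<titleStmt>
<title xml:lang="en">Knowledge Extraction and Semantic Annotation of Text from the Encyclopedia of Life</title>
<author>
<name sortKey="Thessen, Anne E">Anne E. Thessen</name>
<affiliation wicri:level="2"><country xml:lang="fr">États-Unis</country></affiliation>
</author>
<author>
<name sortKey="Parr, Cynthia Sims">Cynthia Sims Parr</name>
<affiliation wicri:level="1"><country xml:lang="fr">États-Unis</country></affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="pmid">24594988</idno>
<idno type="doi">10.1371/journal.pone.0089550</idno>
<idno type="wicri:Area/Main/Exploration">000204</idno>
</publicationStmt>
</fileDesc></teiHeader></TEI>
</record>"""


def summarize(record_xml):
    """Collect the title, author names, and typed identifiers from one record."""
    root = ET.fromstring(record_xml)
    title_stmt = root.find(".//titleStmt")
    pub_stmt = root.find(".//publicationStmt")
    return {
        "title": title_stmt.findtext("title"),
        "authors": [name.text for name in title_stmt.findall("author/name")],
        # Keyed by the type attribute, e.g. "pmid", "doi".
        "ids": {idno.get("type"): idno.text for idno in pub_stmt.findall("idno")},
    }


summary = summarize(SAMPLE)
print(summary["authors"])
print(summary["ids"]["doi"])
```

Reading authors from `titleStmt` only avoids counting them twice, since the full record repeats the same names under `sourceDesc`.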

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000204 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000204 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     PMC:3940440
   |texte=   Knowledge Extraction and Semantic Annotation of Text from the Encyclopedia of Life
}}
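
When many records need such a link, the template call could be generated rather than typed by hand. The following Python is a minimal sketch under that assumption: the function name `explor_link` and its argument names are hypothetical, the parameter names (`wiki`, `area`, `flux`, `étape`, `type`, `clé`, `texte`) are copied from the template call above, and it assumes the column alignment of the values is cosmetic only.

```python
def explor_link(wiki, area, flux, etape, key_type, key, text):
    """Render an {{Explor lien}} call using the parameter names shown above."""
    fields = [
        ("wiki", wiki),
        ("area", area),
        ("flux", flux),
        ("étape", etape),
        ("type", key_type),
        ("clé", key),
        ("texte", text),
    ]
    lines = ["{{Explor lien"]
    # One "|name= value" line per parameter, in the same order as the page.
    lines += [f"   |{name}= {value}" for name, value in fields]
    lines.append("}}")
    return "\n".join(lines)


link = explor_link(
    wiki="Ticri/CIDE",
    area="CyberinfraV1",
    flux="Main",
    etape="Exploration",
    key_type="RBID",
    key="PMC:3940440",
    text="Knowledge Extraction and Semantic Annotation of Text "
         "from the Encyclopedia of Life",
)
print(link)
```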

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i   -Sk "pubmed:24594988" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1 

Wicri

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024